Dataset statistics
| Number of variables | 12 |
|---|---|
| Number of observations | 135397 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 12.4 MiB |
| Average record size in memory | 96.0 B |
Variable types
| NUM | 10 |
|---|---|
| CAT | 2 |
Reproduction
| Analysis started | 2020-04-06 12:49:09.061681 |
|---|---|
| Analysis finished | 2020-04-06 12:59:20.524814 |
| Version | pandas-profiling v2.5.0 |
| Command line | pandas_profiling --config_file config.yaml [YOUR_FILE.csv] |
| Download configuration | config.yaml |
df_index
Real number (ℝ≥0)
| Distinct count | 135397 |
|---|---|
| Unique (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 270462.1444 |
|---|---|
| Minimum | 13 |
| Maximum | 539752 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 1.0 MiB |
Quantile statistics
| Minimum | 13 |
|---|---|
| 5-th percentile | 28393.6 |
| Q1 | 136298 |
| median | 268962 |
| Q3 | 405372 |
| 95-th percentile | 513016.2 |
| Maximum | 539752 |
| Range | 539739 |
| Interquartile range (IQR) | 269074 |
Descriptive statistics
| Standard deviation | 155804.7609 |
|---|---|
| Coefficient of variation (CV) | 0.5760686444 |
| Kurtosis | -1.200028544 |
| Mean | 270462.1444 |
| Mean absolute deviation (MAD) | 134906.0336 |
| Skewness | 0.0101182272 |
| Sum | 3.661976296e+10 |
| Variance | 2.427512351e+10 |
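Several rows in these tables are derived from others: CV = standard deviation / mean, variance = (standard deviation)², range = maximum − minimum, IQR = Q3 − Q1, and sum = mean × number of observations. The sketch below re-derives them from the rounded df_index values printed above as a consistency check. Because it starts from rounded figures, agreement is only expected to about six significant figures.

```python
import math

# Summary values copied from the df_index tables above (already rounded by the report).
n_obs = 135397
mean = 270462.1444
std = 155804.7609
minimum, maximum = 13, 539752
q1, q3 = 136298, 405372

cv = std / mean                  # coefficient of variation = sigma / mu
variance = std ** 2              # variance = sigma squared
value_range = maximum - minimum  # range = max - min
iqr = q3 - q1                    # interquartile range = Q3 - Q1
total = mean * n_obs             # sum = mean * number of observations

print(f"CV={cv:.10f} variance={variance:.9e} range={value_range} IQR={iqr} sum={total:.9e}")
```

The same arithmetic applies to every numeric variable in the report.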
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1.300000e+01 9.545000e+02 3.701500e+03 4.028500e+03 6.405500e+03 ... 5.341785e+05 5.368050e+05 5.383260e+05 5.391885e+05 5.397520e+05], "bayesian blocks" binning strategy used)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 264191 | 1 | < 0.1% |
| 116242 | 1 | < 0.1% |
| 161252 | 1 | < 0.1% |
| 400873 | 1 | < 0.1% |
| 407020 | 1 | < 0.1% |
| 409069 | 1 | < 0.1% |
| 495966 | 1 | < 0.1% |
| 443890 | 1 | < 0.1% |
| 358830 | 1 | < 0.1% |
| 189942 | 1 | < 0.1% |
| Other values (135387) | 135387 | > 99.9% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 13 | 1 | < 0.1% |
| 25 | 1 | < 0.1% |
| 28 | 1 | < 0.1% |
| 29 | 1 | < 0.1% |
| 31 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 539752 | 1 | < 0.1% |
| 539744 | 1 | < 0.1% |
| 539735 | 1 | < 0.1% |
| 539732 | 1 | < 0.1% |
| 539724 | 1 | < 0.1% |
price
Real number (ℝ≥0)
| Distinct count | 5183 |
|---|---|
| Unique (%) | 3.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 11767.09613 |
|---|---|
| Minimum | 1001 |
| Maximum | 39999 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 1.0 MiB |
Quantile statistics
| Minimum | 1001 |
|---|---|
| 5-th percentile | 2500 |
| Q1 | 5250 |
| median | 8999 |
| Q3 | 16400 |
| 95-th percentile | 28990 |
| Maximum | 39999 |
| Range | 38998 |
| Interquartile range (IQR) | 11150 |
Descriptive statistics
| Standard deviation | 8368.352679 |
|---|---|
| Coefficient of variation (CV) | 0.7111654893 |
| Kurtosis | 0.5840741446 |
| Mean | 11767.09613 |
| Mean absolute deviation (MAD) | 6732.04321 |
| Skewness | 1.096526073 |
| Sum | 1593229515 |
| Variance | 70029326.55 |
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 1001. 1099.5 1105.5 1192.5 1199.5 ... 39990.5 39993. 39996. 39998.5 39999. ], "bayesian blocks" binning strategy used)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 3500 | 1755 | 1.3% |
| 6995 | 1692 | 1.2% |
| 4995 | 1664 | 1.2% |
| 4500 | 1652 | 1.2% |
| 5995 | 1590 | 1.2% |
| 5500 | 1589 | 1.2% |
| 7995 | 1587 | 1.2% |
| 6500 | 1476 | 1.1% |
| 3995 | 1467 | 1.1% |
| 8995 | 1466 | 1.1% |
| Other values (5173) | 119459 | 88.2% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1001 | 1 | < 0.1% |
| 1005 | 1 | < 0.1% |
| 1024 | 1 | < 0.1% |
| 1028 | 1 | < 0.1% |
| 1050 | 5 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 39999 | 34 | < 0.1% |
| 39998 | 14 | < 0.1% |
| 39997 | 4 | < 0.1% |
| 39995 | 72 | 0.1% |
| 39991 | 2 | < 0.1% |
year
Real number (ℝ≥0)
| Distinct count | 96 |
|---|---|
| Unique (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2009.046618 |
|---|---|
| Minimum | 1908 |
| Maximum | 2021 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 1.0 MiB |
Quantile statistics
| Minimum | 1908 |
|---|---|
| 5-th percentile | 1998 |
| Q1 | 2006 |
| median | 2010 |
| Q3 | 2014 |
| 95-th percentile | 2017 |
| Maximum | 2021 |
| Range | 113 |
| Interquartile range (IQR) | 8 |
Descriptive statistics
| Standard deviation | 7.501392696 |
|---|---|
| Coefficient of variation (CV) | 0.003733807183 |
| Kurtosis | 18.13918395 |
| Mean | 2009.046618 |
| Mean absolute deviation (MAD) | 5.131765936 |
| Skewness | -2.963946261 |
| Sum | 272018885 |
| Variance | 56.27089238 |
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1908. 1928.5 1941.5 1946.5 1961.5 ... 2017.5 2018.5 2019.5 2020.5 2021. ], "bayesian blocks" binning strategy used)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 2013 | 10079 | 7.4% |
| 2012 | 9689 | 7.2% |
| 2011 | 9171 | 6.8% |
| 2008 | 9104 | 6.7% |
| 2014 | 9082 | 6.7% |
| 2015 | 8801 | 6.5% |
| 2007 | 8094 | 6.0% |
| 2010 | 7644 | 5.6% |
| 2016 | 7567 | 5.6% |
| 2006 | 7105 | 5.2% |
| Other values (86) | 49061 | 36.2% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1908 | 3 | < 0.1% |
| 1923 | 6 | < 0.1% |
| 1925 | 1 | < 0.1% |
| 1926 | 1 | < 0.1% |
| 1927 | 2 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2021 | 1 | < 0.1% |
| 2020 | 132 | 0.1% |
| 2019 | 2129 | 1.6% |
| 2018 | 3509 | 2.6% |
| 2017 | 6247 | 4.6% |
manufacturer
Real number (ℝ≥0)
| Distinct count | 41 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 19.20192471 |
|---|---|
| Minimum | 0 |
| Maximum | 42 |
| Zeros | 1163 |
| Zeros (%) | 0.9% |
| Memory size | 1.0 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 5 |
| Q1 | 10 |
| median | 14 |
| Q3 | 32 |
| 95-th percentile | 40 |
| Maximum | 42 |
| Range | 42 |
| Interquartile range (IQR) | 22 |
Descriptive statistics
| Standard deviation | 11.93364378 |
|---|---|
| Coefficient of variation (CV) | 0.6214816464 |
| Kurtosis | -1.039684824 |
| Mean | 19.20192471 |
| Mean absolute deviation (MAD) | 10.31938783 |
| Skewness | 0.5597150933 |
| Sum | 2599883 |
| Variance | 142.411854 |
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 2.5 3.5 ... 38.5 39.5 40.5 41.5 42. ], "bayesian blocks" binning strategy used)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 13 | 26261 | 19.4% |
| 7 | 21166 | 15.6% |
| 40 | 11189 | 8.3% |
| 32 | 7698 | 5.7% |
| 17 | 7481 | 5.5% |
| 21 | 6254 | 4.6% |
| 35 | 5989 | 4.4% |
| 14 | 5986 | 4.4% |
| 10 | 4904 | 3.6% |
| 4 | 3606 | 2.7% |
| Other values (31) | 34863 | 25.7% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1163 | 0.9% |
| 1 | 30 | < 0.1% |
| 2 | 1 | < 0.1% |
| 3 | 1468 | 1.1% |
| 4 | 3606 | 2.7% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 42 | 923 | 0.7% |
| 41 | 3060 | 2.3% |
| 40 | 11189 | 8.3% |
| 39 | 4 | < 0.1% |
| 38 | 3402 | 2.5% |
condition
Real number (ℝ≥0)
| Distinct count | 6 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.138082823 |
|---|---|
| Minimum | 0 |
| Maximum | 5 |
| Zeros | 64710 |
| Zeros (%) | 47.8% |
| Memory size | 1.0 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 1 |
| Q3 | 2 |
| 95-th percentile | 3 |
| Maximum | 5 |
| Range | 5 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 1.152550691 |
|---|---|
| Coefficient of variation (CV) | 1.012712491 |
| Kurtosis | -1.388710329 |
| Mean | 1.138082823 |
| Mean absolute deviation (MAD) | 1.094926788 |
| Skewness | 0.2596369366 |
| Sum | 154093 |
| Variance | 1.328373095 |
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0. 0.5 1.5 2.5 3.5 5. ], "bayesian blocks" binning strategy used)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 64710 | 47.8% |
| 2 | 51790 | 38.3% |
| 3 | 14857 | 11.0% |
| 1 | 3473 | 2.6% |
| 4 | 366 | 0.3% |
| 5 | 201 | 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 64710 | 47.8% |
| 1 | 3473 | 2.6% |
| 2 | 51790 | 38.3% |
| 3 | 14857 | 11.0% |
| 4 | 366 | 0.3% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 5 | 201 | 0.1% |
| 4 | 366 | 0.3% |
| 3 | 14857 | 11.0% |
| 2 | 51790 | 38.3% |
| 1 | 3473 | 2.6% |
cylinders
Real number (ℝ≥0)
| Distinct count | 8 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.663810867 |
|---|---|
| Minimum | 0 |
| Maximum | 7 |
| Zeros | 916 |
| Zeros (%) | 0.7% |
| Memory size | 1.0 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 3 |
| Q1 | 3 |
| median | 5 |
| Q3 | 6 |
| 95-th percentile | 6 |
| Maximum | 7 |
| Range | 7 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Standard deviation | 1.265828633 |
|---|---|
| Coefficient of variation (CV) | 0.2714150872 |
| Kurtosis | -0.3984356884 |
| Mean | 4.663810867 |
| Mean absolute deviation (MAD) | 1.085501421 |
| Skewness | -0.6598825671 |
| Sum | 631466 |
| Variance | 1.602322128 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 5 | 50032 | 37.0% |
| 6 | 41993 | 31.0% |
| 3 | 40725 | 30.1% |
| 4 | 1286 | 0.9% |
| 0 | 916 | 0.7% |
| 7 | 238 | 0.2% |
| 2 | 156 | 0.1% |
| 1 | 51 | < 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 916 | 0.7% |
| 1 | 51 | < 0.1% |
| 2 | 156 | 0.1% |
| 3 | 40725 | 30.1% |
| 4 | 1286 | 0.9% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 7 | 238 | 0.2% |
| 6 | 41993 | 31.0% |
| 5 | 50032 | 37.0% |
| 4 | 1286 | 0.9% |
| 3 | 40725 | 30.1% |
fuel
Real number (ℝ≥0)
| Distinct count | 5 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.891038945 |
|---|---|
| Minimum | 0 |
| Maximum | 4 |
| Zeros | 8902 |
| Zeros (%) | 6.6% |
| Memory size | 1.0 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 2 |
| median | 2 |
| Q3 | 2 |
| 95-th percentile | 2 |
| Maximum | 4 |
| Range | 4 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.5383426671 |
|---|---|
| Coefficient of variation (CV) | 0.2846808991 |
| Kurtosis | 8.904212474 |
| Mean | 1.891038945 |
| Mean absolute deviation (MAD) | 0.2499516312 |
| Skewness | -2.328496096 |
| Sum | 256041 |
| Variance | 0.2898128272 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 124244 | 91.8% |
| 0 | 8902 | 6.6% |
| 3 | 1157 | 0.9% |
| 4 | 996 | 0.7% |
| 1 | 98 | 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 8902 | 6.6% |
| 1 | 98 | 0.1% |
| 2 | 124244 | 91.8% |
| 3 | 1157 | 0.9% |
| 4 | 996 | 0.7% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 4 | 996 | 0.7% |
| 3 | 1157 | 0.9% |
| 2 | 124244 | 91.8% |
| 1 | 98 | 0.1% |
| 0 | 8902 | 6.6% |
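A check on the MAD row for this variable: more than half of the 135,397 observations equal the median value 2, so a median absolute deviation would be exactly 0, yet the report shows 0.2499516312. That figure matches the mean absolute deviation about the mean, which is what pandas' `Series.mad()` returned, so the row most likely holds that statistic (an inference from the numbers, not something the report states). The sketch below rebuilds the column from the value counts above (the raw rows are not available here) and computes both quantities with the standard library only.

```python
import statistics

# Value counts for this variable, copied from the frequency table above.
counts = {0: 8902, 1: 98, 2: 124244, 3: 1157, 4: 996}
# Expand the counts back into one value per observation (order does not matter here).
values = [v for v, c in counts.items() for _ in range(c)]

mean = statistics.fmean(values)
med = statistics.median(values)

# Mean absolute deviation: average distance from the mean.
mean_abs_dev = statistics.fmean(abs(x - mean) for x in values)
# Median absolute deviation: median distance from the median.
median_abs_dev = statistics.median(abs(x - med) for x in values)

print(len(values), mean, mean_abs_dev, median_abs_dev)
```

The same reasoning applies to the other integer-coded columns: their MAD rows are non-integer, which a median absolute deviation of integer data with an odd number of observations cannot be.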
odometer
Real number (ℝ≥0)
| Distinct count | 303 |
|---|---|
| Unique (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 22.7956971 |
|---|---|
| Minimum | 0 |
| Maximum | 2000 |
| Zeros | 3219 |
| Zeros (%) | 2.4% |
| Memory size | 1.0 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 3 |
| Q1 | 13 |
| median | 22 |
| Q3 | 30 |
| 95-th percentile | 43 |
| Maximum | 2000 |
| Range | 2000 |
| Interquartile range (IQR) | 17 |
Descriptive statistics
| Standard deviation | 26.2441299 |
|---|---|
| Coefficient of variation (CV) | 1.151275602 |
| Kurtosis | 2474.675635 |
| Mean | 22.7956971 |
| Mean absolute deviation (MAD) | 10.55983543 |
| Skewness | 38.90228426 |
| Sum | 3086469 |
| Variance | 688.7543544 |
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.0000e+00 5.0000e-01 1.5000e+00 2.5000e+00 3.5000e+00 ... 4.0050e+02 5.1700e+02 7.5450e+02 1.9995e+03 2.0000e+03], "bayesian blocks" binning strategy used)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 19 | 4675 | 3.5% |
| 20 | 4642 | 3.4% |
| 24 | 4545 | 3.4% |
| 22 | 4396 | 3.2% |
| 18 | 4382 | 3.2% |
| 23 | 4251 | 3.1% |
| 25 | 4165 | 3.1% |
| 21 | 4148 | 3.1% |
| 26 | 4136 | 3.1% |
| 16 | 4000 | 3.0% |
| Other values (293) | 92057 | 68.0% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 3219 | 2.4% |
| 1 | 946 | 0.7% |
| 2 | 1606 | 1.2% |
| 3 | 1837 | 1.4% |
| 4 | 2117 | 1.6% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2000 | 2 | < 0.1% |
| 1999 | 5 | < 0.1% |
| 1737 | 1 | < 0.1% |
| 1668 | 2 | < 0.1% |
| 1560 | 1 | < 0.1% |
transmission
Categorical
| Distinct count | 3 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.0 MiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 120729 | 89.2% |
| 1 | 9317 | 6.9% |
| 2 | 5351 | 4.0% |
Length
| Max length | 1 |
|---|---|
| Mean length | 1 |
| Min length | 1 |
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal_Number | 3 | 100.0% |
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 3 | 100.0% |
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 3 | 100.0% |
drive
Categorical
| Distinct count | 3 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.0 MiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 60137 | 44.4% |
| 1 | 45567 | 33.7% |
| 2 | 29693 | 21.9% |
Length
| Max length | 1 |
|---|---|
| Mean length | 1 |
| Min length | 1 |
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal_Number | 3 | 100.0% |
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 3 | 100.0% |
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 3 | 100.0% |
type
Real number (ℝ≥0)
| Distinct count | 13 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.144242487 |
|---|---|
| Minimum | 0 |
| Maximum | 12 |
| Zeros | 33237 |
| Zeros (%) | 24.5% |
| Memory size | 1.0 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 2 |
| median | 8 |
| Q3 | 9 |
| 95-th percentile | 11 |
| Maximum | 12 |
| Range | 12 |
| Interquartile range (IQR) | 7 |
Descriptive statistics
| Standard deviation | 4.143149038 |
|---|---|
| Coefficient of variation (CV) | 0.6743140504 |
| Kurtosis | -1.400031046 |
| Mean | 6.144242487 |
| Mean absolute deviation (MAD) | 3.811965893 |
| Skewness | -0.4984001379 |
| Sum | 831912 |
| Variance | 17.16568395 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 9 | 34110 | 25.2% |
| 0 | 33237 | 24.5% |
| 10 | 21732 | 16.1% |
| 8 | 15920 | 11.8% |
| 3 | 8144 | 6.0% |
| 4 | 4752 | 3.5% |
| 11 | 4658 | 3.4% |
| 12 | 3946 | 2.9% |
| 2 | 3300 | 2.4% |
| 5 | 3147 | 2.3% |
| Other values (3) | 2451 | 1.8% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 33237 | 24.5% |
| 1 | 138 | 0.1% |
| 2 | 3300 | 2.4% |
| 3 | 8144 | 6.0% |
| 4 | 4752 | 3.5% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 12 | 3946 | 2.9% |
| 11 | 4658 | 3.4% |
| 10 | 21732 | 16.1% |
| 9 | 34110 | 25.2% |
| 8 | 15920 | 11.8% |
paint_color
Real number (ℝ≥0)
| Distinct count | 12 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5.635331654 |
|---|---|
| Minimum | 0 |
| Maximum | 11 |
| Zeros | 25229 |
| Zeros (%) | 18.6% |
| Memory size | 1.0 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 7 |
| Q3 | 9 |
| 95-th percentile | 10 |
| Maximum | 11 |
| Range | 11 |
| Interquartile range (IQR) | 8 |
Descriptive statistics
| Standard deviation | 3.985860394 |
|---|---|
| Coefficient of variation (CV) | 0.7072982813 |
| Kurtosis | -1.582309131 |
| Mean | 5.635331654 |
| Mean absolute deviation (MAD) | 3.665926811 |
| Skewness | -0.2840279952 |
| Sum | 763007 |
| Variance | 15.88708308 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 10 | 32045 | 23.7% |
| 0 | 25229 | 18.6% |
| 9 | 20417 | 15.1% |
| 5 | 15707 | 11.6% |
| 1 | 14416 | 10.6% |
| 8 | 14219 | 10.5% |
| 4 | 4226 | 3.1% |
| 2 | 3841 | 2.8% |
| 3 | 3160 | 2.3% |
| 11 | 967 | 0.7% |
| Other values (2) | 1170 | 0.9% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 25229 | 18.6% |
| 1 | 14416 | 10.6% |
| 2 | 3841 | 2.8% |
| 3 | 3160 | 2.3% |
| 4 | 4226 | 3.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 11 | 967 | 0.7% |
| 10 | 32045 | 23.7% |
| 9 | 20417 | 15.1% |
| 8 | 14219 | 10.5% |
| 7 | 378 | 0.3% |
Pearson's r
The Pearson correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables: replacing X by a + bX and Y by c + dY, with b, d > 0, leaves r unchanged. This implies that for an exact linear relationship, the slope of the line does not affect r; only its sign does. To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
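A minimal pure-Python sketch of the definition above, using small hypothetical data rather than this dataset. It computes r as the covariance divided by the product of the standard deviations. It then checks the two invariance claims: separately shifting and positively rescaling each variable leaves r unchanged, and an exact linear relationship gives ±1 whatever its slope. The name `pearson_r` is illustrative; in practice `pandas.Series.corr` or `numpy.corrcoef` would be used.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx = sum(xs) / n
    my = sum(ys) / n
    # Sample (n - 1) normalisation; it cancels between numerator and denominator.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return cov / (sx * sy)  # undefined (ZeroDivisionError) if either variable is constant

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson_r(xs, ys)

# Separate changes of location and positive scale leave r unchanged.
r_shifted = pearson_r([10 + 3 * x for x in xs], [-7 + 0.5 * y for y in ys])

# An exact linear relationship gives +1 or -1 regardless of the slope's magnitude.
r_steep = pearson_r(xs, [100 * x for x in xs])
r_neg = pearson_r(xs, [-0.01 * x for x in xs])
```

Here the deviations from the means are (-2, -1, 0, 1, 2) and (-1, -2, 1, 0, 2), so r = 8 / √(10 · 10) = 0.8.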
Spearman's ρ
The Spearman rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of their standard deviations; in other words, ρ is Pearson's r computed on the ranks.
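The rank-then-correlate recipe above can be sketched in plain Python. Tied values receive the average of the ranks they span, which is the convention `pandas.Series.rank` uses by default; the helper names and sample data are hypothetical. The cubic example shows the motivating case: the relationship is nonlinear but strictly increasing, so ρ is exactly 1 while Pearson's r on the raw values is below 1.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Covariance over product of standard deviations (normalisation constants cancel)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]  # nonlinear but strictly increasing
rho = spearman_rho(xs, ys)
r_raw = pearson_r(xs, ys)
```

For this data, r on the raw values is roughly 0.94, while the ranks of `xs` and `ys` coincide, so ρ is 1.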
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n − 1)/2.
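A sketch of the pair-counting definition above, in plain Python with hypothetical helper names and data. The formula in the text is the τ_a variant. When values are tied, as they often are in the integer-coded columns of this dataset, the τ_b variant divides instead by √((n₀ − n₁)(n₀ − n₂)), where n₀ = n(n − 1)/2 and n₁, n₂ count the pairs tied in X and in Y. To our understanding, τ_b is what `scipy.stats.kendalltau` returns by default and what pandas uses for `method="kendall"`; whether this report's matrix used τ_b is an assumption. The O(n²) loop is fine for illustration but would be slow on 135,397 rows.

```python
import math
from itertools import combinations

def kendall_tau(xs, ys):
    """Return (tau_a, tau_b) by counting concordant and discordant pairs in O(n^2)."""
    n = len(xs)
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(n), 2):
        dx = xs[i] - xs[j]
        dy = ys[i] - ys[j]
        if dx == 0 and dy == 0:
            # Tied in both: counts toward both tie totals, neither concordant nor discordant.
            ties_x += 1
            ties_y += 1
        elif dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    n_pairs = n * (n - 1) // 2
    tau_a = (concordant - discordant) / n_pairs
    # tau_b removes tied pairs from each side of the denominator;
    # undefined (division by zero) if every pair is tied in X or in Y.
    tau_b = (concordant - discordant) / math.sqrt((n_pairs - ties_x) * (n_pairs - ties_y))
    return tau_a, tau_b

tau_a, tau_b = kendall_tau([1, 2, 3, 4, 5], [3, 1, 2, 5, 4])  # no ties: 7 concordant, 3 discordant
tied_a, tied_b = kendall_tau([1, 1, 2, 3], [1, 2, 2, 3])      # one pair tied in X, one tied in Y
```

Without ties, both variants give (7 − 3)/10 = 0.4. With ties, τ_a = 4/6 while τ_b = 4/√(5 · 5) = 0.8, because tied pairs no longer count against the coefficient.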
First rows
| | df_index | price | year | manufacturer | condition | cylinders | fuel | odometer | transmission | drive | type | paint_color |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 13 | 7995 | 2010 | 7 | 0 | 6 | 2 | 38 | 0 | 0 | 10 | 10 |
| 1 | 25 | 4000 | 1995 | 10 | 0 | 6 | 2 | 26 | 0 | 0 | 10 | 5 |
| 2 | 28 | 16000 | 2011 | 4 | 0 | 5 | 2 | 17 | 0 | 1 | 9 | 5 |
| 3 | 29 | 10950 | 2011 | 5 | 0 | 5 | 2 | 8 | 0 | 1 | 9 | 8 |
| 4 | 31 | 9400 | 2011 | 4 | 2 | 5 | 2 | 29 | 0 | 0 | 0 | 1 |
| 5 | 35 | 4500 | 2012 | 13 | 0 | 5 | 2 | 31 | 0 | 0 | 9 | 9 |
| 6 | 40 | 1495 | 2004 | 18 | 2 | 5 | 2 | 39 | 0 | 0 | 0 | 8 |
| 7 | 42 | 2800 | 2002 | 32 | 3 | 5 | 2 | 38 | 0 | 0 | 0 | 9 |
| 8 | 48 | 25900 | 2008 | 13 | 0 | 6 | 2 | 14 | 0 | 2 | 10 | 10 |
| 9 | 62 | 4999 | 2007 | 19 | 2 | 5 | 2 | 22 | 0 | 2 | 9 | 3 |
Last rows
| | df_index | price | year | manufacturer | condition | cylinders | fuel | odometer | transmission | drive | type | paint_color |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 135387 | 539701 | 27755 | 2011 | 13 | 0 | 6 | 0 | 31 | 0 | 0 | 8 | 8 |
| 135388 | 539702 | 22457 | 2008 | 35 | 2 | 5 | 0 | 44 | 0 | 0 | 0 | 8 |
| 135389 | 539711 | 13500 | 2014 | 32 | 0 | 3 | 2 | 13 | 0 | 0 | 0 | 5 |
| 135390 | 539712 | 2700 | 2002 | 17 | 5 | 3 | 2 | 36 | 0 | 1 | 9 | 10 |
| 135391 | 539714 | 3950 | 2009 | 18 | 0 | 3 | 2 | 25 | 0 | 1 | 9 | 1 |
| 135392 | 539724 | 8995 | 2007 | 24 | 0 | 5 | 2 | 37 | 0 | 0 | 9 | 0 |
| 135393 | 539732 | 9457 | 2008 | 25 | 2 | 6 | 2 | 38 | 0 | 0 | 0 | 10 |
| 135394 | 539735 | 7455 | 2013 | 40 | 2 | 3 | 2 | 27 | 0 | 1 | 9 | 0 |
| 135395 | 539744 | 6300 | 2014 | 32 | 2 | 3 | 2 | 17 | 0 | 1 | 9 | 5 |
| 135396 | 539752 | 5295 | 2006 | 3 | 0 | 3 | 2 | 30 | 0 | 0 | 12 | 3 |